Project Data Description

In this project, we analyze 3,202 stock price and volume data time series traded on the NASDAQ exchange between May 30th and October 30th, 2019. This date range was selected for its distance from significant biological and political disruption to the markets, which can both introduce artificial seasonality and increased random variation into forecasts. Data was sourced as comma-separated values through API from AlphaVantage.

Because of the time constraints involved with directly analyzing each stock, we developed a loop to process through each file, perform a linear model on each stock and where the slope for price is positive with a slope greater than 0.04 and the price is within an affordable range - between 5 and 50 dollars per share - we then select that stock to assess if the spectral density indicates any wandering behavior based on a peak at zero and no additional peaks thereafter. Because of this wandering, the sample ACFs were also expected to damp exponentially, indicative of non-stationary behavior, potentially trending behavior. This method allowed us to identify seven potentially ideal stocks for signal-plus-noise modeling with postive, profitable trending. From these 7 stocks, we selected one and modeled it. We chose this stock to model based on the stationarity of the noise around the signal.

Data Selection

plotts.sample.wge(df$low, trunc=25, arlimits=T)
files = list.files(path='../Time-Series-Stocks', pattern='*.csv')

for(file in files){
  actualFile <- paste0('../Time-Series-Stocks/',file)
  
  df <- read.csv(actualFile, header=T, strip.white=T)

  # run linear regression to get the signal (average).
  t = seq(1, nrow(df),1)
  fit = lm(df$low~t)
  
  # get the frequency values from the spectral density in the Parzen Window (we want them to wander without season; just trend)
  dbz <- plotts.sample.wge(df$low)[4] # plotting sample plots to see the stocks while they process

  # if the linear coefficient (deterministic signal) for the price is positive and the price is between 5 and 50 (affordable):
  if(fit$coefficients[2] > 0.04 && df$low[nrow(df)] > 5 && df$low[nrow(df)] < 50){
    for(i in 1:(length(dbz$dbz)-1)){
      # if the realization is wandering (based on spectral density):
      if(dbz$dbz[i] > dbz$dbz[i+1]){
        write.table(df$symbol, './models_aic_less_than_0.csv', append=T)
      }
    }
  }
}

Candlestick chart for Exploratory Data Analysis (EDA)

Because we have a high, low, open, and close price, we wanted to visually inspect the relationship between these prices across each data point. Through this visual inspection, we noticed differing amounts of variation within each price sets across all data points. As a result, we decided to engineer two new features representing the difference between the high and low and open and close prices. These two new features allowed our multivariate modeling to ingest additional insights into the dynamic relationships betweeen prices in our data.

urlfile = "https://raw.githubusercontent.com/PaulAdams4361/Time-Series-Project/master/NASDAQ_Daily_ACGL.csv"
df <- read_csv(url(urlfile))
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
df <- data.frame(Date=df$times, coredata(df[,2:5]))

fig <- df %>% plot_ly(x = ~Date, type="candlestick",
          open = ~open, close = ~close,
          high = ~high, low = ~low) 
fig <- fig %>% layout(title = "Candlestick Chart for ACGL")

fig

Candlestick Chart for ACGL

After anlayzing the candlestick plot, we decided to use the low price as the target feature of the model. The reason we chose this is because when a stock is trending up, the low price is quick to point this out because the moving average will often rise above the low price, especially for stronger uptrending behavior. This further provides insight into potential investment profitability.

stock_data <- function(x){
  urlfile="https://raw.githubusercontent.com/PaulAdams4361/Time-Series-Project/master/NASDAQ_Daily_ACGL.csv"
  df <- read_csv(url(urlfile))
  df$volume <- (df$volume/10000)
  HiLo <- df$high - df$low
  HiClo <- df$high - df$close
  OpHi <- df$open - df$high
  OpClo <- df$open - df$close
  OpLo <- df$open - df$low
  CloLo <- df$close - df$low
  varianceRatio <- (df$open - df$close) / (df$high - df$low) * 100
  spread <- df$high - df$low
  df <- data.frame(cbind(df, varianceRatio, HiLo, HiClo, OpHi, OpClo, OpLo, CloLo))
  return(df)
}

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
stock_data_trans <- function(x){
  urlfile="https://raw.githubusercontent.com/PaulAdams4361/Time-Series-Project/master/NASDAQ_Daily_ACGL.csv"
  df <- read_csv(url(urlfile))
  df2 <- df[1:(nrow(df)-1),]
  df2$volume <- (df2$volume/10000)
  lowww <- artrans.wge(df$low, phi.tr=1)
  HiLo <- df2$high - lowww
  HiClo <- df2$high - df2$close
  OpHi <- df2$open - df2$high
  OpClo <- df2$open - df2$close
  OpLo <- df2$open - lowww
  CloLo <- df2$close - lowww
  varianceRatio <- (df2$open - df2$close) / (df2$high - lowww) * 100
  spread <- df2$high - lowww
  df2$low <- lowww
  dfnew <- data.frame(cbind(df2, varianceRatio, HiLo, HiClo, OpHi, OpClo, OpLo, CloLo))
  return(dfnew)
}

dftrans <- stock_data_trans()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

In this Signal-Plus-Noise model, we perform a hypothesis test on the linear trend to identify using OLS if the trend is possibly deterministic. After positive confirmation from OLS, we then tested this with the Cochrane-Orcutt AR(1) based hypothesis test, which accounts for serial correlation in determining slope significance. This test also confirmed the signal is a deterministic trend.

Next, we removed the residuals from the trend line and built a model for this data. We then tested the residuals for white noise variance using the Ljung-Box test, which indicated there is not enough evidence to consider the residuals to be more than white noise. Because of the stationarity of the residuals, we were able to estimate a model using the linear slope and adding to it the variation of the residuals.

Forecasting error was measured in terms of Average Squared Error using the last trading week’s data points, for which there were five. The ASE was 0.1654078.

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
# take a sample of the data, analyze
# plotts.sample.wge(df$low, arlimits=T)

####################
# Signal-Plus-Noise
####################
x = df$low
n = length(x)
t = 1:n
fit = lm(x ~ t)
summary(fit) # there appears to be deterministic trend based on OLS; the p-value is significant so reject the null that it is not
## 
## Call:
## lm(formula = x ~ t)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.10389 -0.52398 -0.02693  0.34676  1.35981 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.855438   0.123444  282.36   <2e-16 ***
## t            0.072922   0.002122   34.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6126 on 98 degrees of freedom
## Multiple R-squared:  0.9234, Adjusted R-squared:  0.9226 
## F-statistic:  1181 on 1 and 98 DF,  p-value: < 2.2e-16
# deterministic. The null argues that any present trend is random that will eventually traverse such a pattern that the realization 
# will continue to approximate around the mean. 

# Because OLS is not robust to non-stationarity, we apply the Cochrane-Orcutt test to also test the beta coefficient slope of time
# using an aproach that fits an AR(1) model to the residuals:
cfit = cochrane.orcutt(fit) # to confirm with chochrane-orcutt
summary(cfit) # Cochran-Orcutt also provides a significant p-value. Based on this, we assume the slope is not equal to zero and
## Call:
## lm(formula = x ~ t)
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 35.0451902  0.3933902  89.085 < 2.2e-16 ***
## t            0.0698107  0.0063528  10.989 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.366 on 97 degrees of freedom
## Multiple R-squared:  0.5546 ,  Adjusted R-squared:  0.55
## F-statistic: 120.8 on 1 and 97 DF,  p-value: < 9.97e-19
## 
## Durbin-Watson statistic 
## (original):    0.39519 , p-value: 1.096e-16
## (transformed): 1.49316 , p-value: 3.732e-03
# therefore, there is deterministic that justifies fitting a signal-plus-noise model instead of an ARMA(p,q). However, ARMA(p,q)
# will be fitted later for comparison.

#x.z = x - fit$coefficients[1] - fit$coefficients[2]*t # derive residuals
x.z = fit$residuals # derive residuals (from the regression line)
ar.z = aic.wge(x.z, p=0:6, type="aic") # find a model to use for approximating the residuals. NOTE: (sigmaHAT_a)^2 = 0.1177843
# ar.z$p is the order p (aic selects p=2 where q=0, as does the bic)

# Transform the stock prices by the autoregressive coefficients of the fitted residuals from the linear regression model. 
# Remove phi from residuals to remove serial correlation and allow us to model
y.trans = artrans.wge(df$low, phi.tr=ar.z$phi)
Signal-Plus-Noise Model

Signal-Plus-Noise Model

# also, transform the predictor variable (time) by the autogregressive coefficeints of the fitted residuals as well
t.trans  = artrans.wge(t, phi.tr=ar.z$phi)
Signal-Plus-Noise Model

Signal-Plus-Noise Model

# Finally, regress the newly transformed stock prices (Y-HAT_t) on the transformed time (T-HAT_t)using ordinary least squares
fitco = lm(y.trans ~ t.trans)
summary(fitco) # check to see if the transformed beta coefficient for the slope is still significant
## 
## Call:
## lm(formula = y.trans ~ t.trans)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.97241 -0.18510  0.01497  0.20566  0.70938 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.320641   0.074125  125.74   <2e-16 ***
## t.trans     0.070037   0.004634   15.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3452 on 96 degrees of freedom
## Multiple R-squared:  0.7041, Adjusted R-squared:  0.701 
## F-statistic: 228.4 on 1 and 96 DF,  p-value: < 2.2e-16
# Evaluating the residuals after Cochrane-Orcutt:
plotts.wge(fitco$residuals)
Signal-Plus-Noise Model

Signal-Plus-Noise Model

acf(fitco$residuals) # residuals appear to be white noise
Signal-Plus-Noise Model

Signal-Plus-Noise Model

ljung.wge(fitco$residuals) # there is not enough evidence based on the ljung-box test to reject the null hypothesis. Therefore, 
## Obs -0.02329353 0.03696201 -0.0148023 -0.1264051 0.02979692 0.1170724 0.0121294 -0.04358782 -0.004767952 -0.09739137 0.04153028 -0.05785741 0.01704484 -0.08708388 -0.04190725 -0.00537386 0.008676478 -0.04774231 -0.123505 0.02173201 -0.01302386 -0.09454542 -0.1174163 -0.0818482
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 12.52529
## 
## $df
## [1] 24
## 
## $pval
## [1] 0.9733174
# we cannot assume that the residuals are not white noise.

# Final Signal-Plus-Noise Model: X_t = 34.855438 + 0.072922*t + Z_t, (sigmaHAT_a)^2 = 0.1177843 summary(fit = lm(x ~ t))
# creates the coefficients
ar.z$phi
## [1]  1.0533533 -0.3193699
# (1 - 1.0533533*B + 0.3193699*B^2)*Z_t = a__t. ar.z$phi from AR(2) creates the coeffients and (sigmaHAT_a)^2 = 0.1177843

# BUT, TO REITERATE: Final Signal-Plus-Noise Model is X_t = 34.855438 + 0.072922*t + Z_t, (sigmaHAT_a)^2 = 0.1177843
est_mod = gen.sigplusnoise.wge(100, b0=34.855438, b1 = 0.072922, phi=ar.z$phi, vara=0.1177843)
Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

plotts.sample.wge(est_mod)
Signal-Plus-Noise Model

Signal-Plus-Noise Model

## $autplt
##  [1] 1.0000000 0.9670224 0.9240121 0.8782062 0.8364075 0.8057569 0.7817007
##  [8] 0.7664134 0.7507401 0.7310993 0.7023206 0.6686347 0.6358117 0.6017895
## [15] 0.5740563 0.5491401 0.5291405 0.5080225 0.4802696 0.4513707 0.4183881
## [22] 0.3818700 0.3467923 0.3113063 0.2735769 0.2439660
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
## [15] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28
## [29] 0.29 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42
## [43] 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  15.0219793   9.1333830  -0.2100552   2.0635456   2.2789051
##  [6]  -0.2828611  -5.7054121  -2.3753868  -2.5921030  -2.6443502
## [11]   0.7969566  -9.6794350 -14.1530076  -4.5998842 -23.8220689
## [16] -11.8624163 -26.8613176  -7.3879585 -25.2047904 -16.5692265
## [21] -16.4984868  -8.6258058 -16.8488645 -13.6078113 -12.2584196
## [26] -12.4389390 -18.8567345 -14.8497982 -18.3850142 -15.6740479
## [31] -12.5452828 -17.6252338 -18.1376637 -14.6012442 -15.3106928
## [36] -13.7683317 -19.7985591 -19.2722357 -14.1750605 -21.8118328
## [41] -13.6258763 -16.4044055 -14.2755471 -14.7685138 -17.9586679
## [46] -18.3320197 -16.6828267 -19.0885538 -15.9778425 -18.7158022
## 
## $dbz
##  [1]  10.7574154  10.0182789   8.7780497   7.0337195   4.8119736
##  [6]   2.2358663  -0.3522439  -2.3900863  -3.5940209  -4.2869054
## [11]  -4.8635168  -5.4829400  -6.2002778  -7.0841883  -8.2020265
## [16]  -9.5440089 -10.9622112 -12.1995519 -13.0717033 -13.6124408
## [21] -13.9523523 -14.1735054 -14.3405037 -14.5556972 -14.9192432
## [26] -15.4554513 -16.0835616 -16.6578662 -17.0643892 -17.2872279
## [31] -17.3738669 -17.3761611 -17.3452592 -17.3412941 -17.4100460
## [36] -17.5464918 -17.6882505 -17.7576605 -17.7258758 -17.6363053
## [41] -17.5680835 -17.5903472 -17.7441697 -18.0414577 -18.4611031
## [46] -18.9411912 -19.3864256 -19.7098689 -19.8836745 -19.9346329
plotts.sample.wge(df$low) # the estimated model (above) matches to the actual model (here) on both sample ACFs and sample spectral
Signal-Plus-Noise Model

Signal-Plus-Noise Model

## $autplt
##  [1] 1.0000000 0.9542096 0.8987323 0.8524345 0.8100849 0.7813310 0.7600881
##  [8] 0.7371705 0.7123568 0.6828689 0.6476693 0.6153474 0.5864842 0.5600570
## [15] 0.5282542 0.4955941 0.4660246 0.4343222 0.4026243 0.3675647 0.3278754
## [22] 0.2900614 0.2568547 0.2354177 0.2239847 0.2178967
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
## [15] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28
## [29] 0.29 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42
## [43] 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  14.2537148   9.9541873   5.8173014  -0.5437278  -0.8235314
##  [6]  -1.3564258   1.4197169  -1.1176687  -7.3533813  -3.8782332
## [11]  -0.5445071  -3.3199576  -5.9250727  -5.4172047  -2.9311702
## [16] -11.1750831  -6.3519213 -18.5926329 -18.7079883  -8.4217804
## [21]  -9.3402655  -9.6710352  -9.7031584  -8.2549030 -14.1830462
## [26] -11.3498316 -12.9743644 -12.1531178 -16.0829501  -9.7837888
## [31] -12.9010407  -9.5737077 -12.8850744 -15.0135304 -10.3417581
## [36] -16.6716427 -13.1487825 -11.0083985 -16.2745553 -18.6304870
## [41] -17.7423184 -13.7665082 -16.0810139 -16.9046379 -13.1224062
## [46] -20.5533807 -18.2518162 -14.7903000 -19.2395748 -19.0265205
## 
## $dbz
##  [1]  10.6247182   9.9050329   8.7000676   7.0111987   4.8698055
##  [6]   2.3951954  -0.1075597  -2.1509930  -3.4458827  -4.1937204
## [11]  -4.7075065  -5.1324715  -5.5423563  -6.0346253  -6.7027509
## [16]  -7.5765092  -8.5979157  -9.6409675 -10.5694523 -11.2993670
## [21] -11.8169224 -12.1628463 -12.4196726 -12.6865263 -13.0293947
## [26] -13.4394270 -13.8345951 -14.1178754 -14.2557593 -14.2960188
## [31] -14.3127453 -14.3564021 -14.4497949 -14.6054534 -14.8345314
## [36] -15.1426627 -15.5243385 -15.9631650 -16.4335783 -16.8977220
## [41] -17.3041080 -17.6055003 -17.7946958 -17.9201902 -18.0549021
## [46] -18.2464892 -18.4904595 -18.7368379 -18.9191710 -18.9861139
# density as well as on partial ACF (below):
pacf(est_mod)
pacf(df$low)

#########################################################################################################
############### Confirmation for repeated model visualization of ACF and Spectral Density ###############
#########################################################################################################
for( i in 1: 5)
{
   SpecDen2 = parzen.wge(gen.sigplusnoise.wge(100, b0=34.855438, b1 = 0.072922, phi=ar.z$phi, vara=0.1177843), plot = T)
   lines(SpecDen2$freq,SpecDen2$pzgram, lwd = 2, col = "red")
}
Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

for( i in 1: 5)
{
   ACF2 = acf(gen.sigplusnoise.wge(100, b0=34.855438, b1 = 0.072922, phi=ar.z$phi, vara=0.1177843), plot = T)
   lines(ACF2$lag ,ACF2$acf, lwd = 2, col = "red")
}
Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

Signal-Plus-Noise Model

#########################################################################################################
#########################################################################################################
#########################################################################################################

signoise.forecast <- fore.sigplusnoise.wge(df$low, max.p=2, n.ahead=5, limits=T, lastn=T, plot=T)
## 
## Coefficients of Original polynomial:  
## 1.0507 -0.3152 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0507B+0.3152B^2    1.6667+-0.6283i      0.5614       0.0574
##   
## 
Signal-Plus-Noise Model

Signal-Plus-Noise Model

SIGNOISE_ASE = mean((df$low[(nrow(df)-5+1):nrow(df)] - signoise.forecast$f)^2)
SIGNOISE_ASE # 0.133713
## [1] 0.133713

ARIMA(p,d,q) Model

In the ARIMA model, we selected a forecast horizon of five trading days because this timeframe completes a full business week. Models can be fully re-developed on non-trading days. However, unless there are 2 unit roots, ARIMA will not forecast a trend to continue. Therefore, the forecast converge back toward the mean.

visual inspection of the spectral density estimate in the Parzen Window and the sample autocorrelations, it is apparent the data are wandering. Therefore, we decided to difference the data to add stationarity into the model.

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
# take a sample of the data, analyze
# plotts.sample.wge(df$low, arlimits=T)

###########################################################
########################################################### ARMA/ARIMA
###########################################################

factor.wge(df$low) # many unit roots in the data
## 
## Coefficients of Original polynomial:  
## 34.2300 33.9800 34.3624 34.5350 34.7800 35.4000 35.4984 35.3650 34.8500 34.9200 34.8900 34.9600 35.4300 35.4900 35.7200 36.0700 36.1800 36.0977 35.8550 35.2100 35.5450 36.3550 37.0200 37.5800 37.7600 37.6800 37.7700 37.9300 38.3300 38.0200 38.2100 38.1785 38.3000 38.1100 37.8100 37.5500 37.3100 37.1500 37.3000 37.6100 37.6500 38.0200 38.2400 38.4200 38.4100 38.3200 37.5700 37.7600 38.1900 39.1200 39.1800 38.8300 38.8350 38.2700 38.3400 38.9400 39.8000 39.7300 39.5600 39.3800 38.4300 38.6700 38.8300 38.8500 39.2800 39.4300 39.3000 40.1300 41.0000 41.1100 41.0900 40.3000 40.3500 40.4000 40.1100 40.2700 40.4400 40.5200 40.9600 40.7200 40.9200 41.0700 41.2300 41.8900 41.8300 41.9000 41.3300 40.6700 40.5200 41.2400 41.6000 40.7400 40.7500 41.0100 41.1900 41.2700 41.7500 41.2100 41.8400 41.7450 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-35.2232B              0.0284               35.2232       0.0000
## 1-1.5436B+1.0067B^2    0.7667+-0.6368i      1.0033       0.1103
## 1-1.1754B+1.0060B^2    0.5842+-0.8079i      1.0030       0.1504
## 1-0.9656B+1.0055B^2    0.4802+-0.8741i      1.0027       0.1700
## 1-0.1240B+1.0053B^2    0.0617+-0.9954i      1.0027       0.2401
## 1+0.8525B+1.0052B^2   -0.4240+-0.9028i      1.0026       0.3199
## 1+0.6193B+1.0052B^2   -0.3081+-0.9486i      1.0026       0.3000
## 1-1.8153B+1.0052B^2    0.9030+-0.4236i      1.0026       0.0698
## 1+1.4615B+1.0050B^2   -0.7272+-0.6829i      1.0025       0.3800
## 1+1.1803B+1.0049B^2   -0.5872+-0.8064i      1.0025       0.3502
## 1-1.4636B+1.0046B^2    0.7284+-0.6817i      1.0023       0.1197
## 1-0.6200B+1.0046B^2    0.3086+-0.9488i      1.0023       0.2000
## 1-1.3722B+1.0044B^2    0.6831+-0.7273i      1.0022       0.1300
## 1-1.9884B+1.0044B^2    0.9899+-0.1258i      1.0022       0.0201
## 1-1.6208B+1.0044B^2    0.8069+-0.5870i      1.0022       0.1001
## 1-0.2510B+1.0044B^2    0.1249+-0.9900i      1.0022       0.2300
## 1-0.4977B+1.0043B^2    0.2478+-0.9666i      1.0022       0.2101
## 1-1.9688B+1.0043B^2    0.9802+-0.1870i      1.0022       0.0300
## 1-1.7551B+1.0043B^2    0.8738+-0.4818i      1.0021       0.0802
## 1-1.2792B+1.0042B^2    0.6369+-0.7682i      1.0021       0.1398
## 1-0.3763B+1.0042B^2    0.1873+-0.9802i      1.0021       0.2199
## 1+0.1270B+1.0040B^2   -0.0633+-0.9960i      1.0020       0.2601
## 1+0.3752B+1.0037B^2   -0.1869+-0.9805i      1.0019       0.2800
## 1-1.8631B+1.0037B^2    0.9281+-0.3674i      1.0019       0.0600
## 1+1.3714B+1.0037B^2   -0.6832+-0.7277i      1.0018       0.3700
## 1-1.9997B+1.0037B^2    0.9962+-0.0630i      1.0018       0.0100
## 1+0.9658B+1.0036B^2   -0.4812+-0.8746i      1.0018       0.3301
## 1+0.7371B+1.0034B^2   -0.3673+-0.9283i      1.0017       0.3100
## 1+0.2521B+1.0033B^2   -0.1257+-0.9904i      1.0016       0.2701
## 1+1.9060B+1.0032B^2   -0.9499+-0.3073i      1.0016       0.4502
## 1+1.7548B+1.0032B^2   -0.8746+-0.4816i      1.0016       0.4199
## 1+1.9874B+1.0032B^2   -0.9906+-0.1250i      1.0016       0.4800
## 1-1.9042B+1.0028B^2    0.9494+-0.3095i      1.0014       0.0501
## 1-1.0736B+1.0028B^2    0.5353+-0.8430i      1.0014       0.1600
## 1+1.0729B+1.0026B^2   -0.5351+-0.8433i      1.0013       0.3400
## 1-1.9398B+1.0025B^2    0.9675+-0.2479i      1.0012       0.0399
## 1-1.6900B+1.0025B^2    0.8429+-0.5357i      1.0012       0.0901
## 1+0.0017B+1.0024B^2   -8e-04+-0.9988i      1.0012       0.2501
## 1+1.8123B+1.0022B^2   -0.9041+-0.4247i      1.0011       0.4301
## 1+1.5436B+1.0022B^2   -0.7701+-0.6362i      1.0011       0.3901
## 1+1.8617B+1.0021B^2   -0.9289+-0.3675i      1.0011       0.4400
## 1+1.2774B+1.0018B^2   -0.6375+-0.7692i      1.0009       0.3601
## 1+1.6906B+1.0018B^2   -0.8438+-0.5350i      1.0009       0.4101
## 1+0.4969B+1.0018B^2   -0.2480+-0.9678i      1.0009       0.2899
## 1+1.6195B+1.0016B^2   -0.8085+-0.5872i      1.0008       0.4000
## 1+1.0008B             -0.9992               1.0008       0.5000
## 1+1.9975B+1.0015B^2   -0.9973+-0.0628i      1.0007       0.4900
## 1+1.9662B+1.0014B^2   -0.9817+-0.1866i      1.0007       0.4701
## 1+1.9383B+1.0011B^2   -0.9681+-0.2483i      1.0005       0.4600
## 1-0.7369B+1.0009B^2    0.3681+-0.9293i      1.0004       0.1900
## 1-0.8524B+0.9998B^2    0.4263+-0.9047i      0.9999       0.1799
##   
## 
aic5.wge(df$low, type="aic")
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##       p    q        aic
## 16    5    0  -1.910377
## 5     1    1  -1.897324
## 7     2    0  -1.893150
## 17    5    1  -1.891381
## 13    4    0  -1.888234
est.arma.wge(df$low, p=5, q=0) # the factor table for df$low provides one (1-B) represented as (1-0.9956B), a close-enough 
## 
## Coefficients of Original polynomial:  
## 1.1877 -0.2735 -0.0237 -0.1048 0.2089 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-0.9956B              1.0044               0.9956       0.0000
## 1-1.0272B+0.5751B^2    0.8930+-0.9702i      0.7584       0.1316
## 1+0.8350B+0.3648B^2   -1.1444+-1.1964i      0.6040       0.3715
##   
## 
## $phi
## [1]  1.1877481 -0.2735138 -0.0236647 -0.1047691  0.2088935
## 
## $theta
## [1] 0
## 
## $res
##   [1] -0.165360679 -0.248592128  0.363885549  0.015565862  0.159446632
##   [6]  0.554243110  0.079621732 -0.057074013 -0.382429141  0.278880923
##  [11] -0.097482597 -0.019422817  0.334796257 -0.030095363  0.241082142
##  [16]  0.359033945  0.152269066 -0.051404314 -0.146420209 -0.534437138
##  [21]  0.436742881  0.635086671  0.406138563  0.388882305  0.274632352
##  [26]  0.164626052  0.312598317  0.267836037  0.402397484 -0.382792087
##  [31]  0.314742389 -0.019790513  0.192239782 -0.262226805 -0.219405077
##  [36] -0.215162983 -0.213589314 -0.212029435  0.064473180  0.182297318
##  [41] -0.079496154  0.364704285  0.212652401  0.234638887  0.019206979
##  [46]  0.025931985 -0.669887571  0.358972500  0.317386679  0.763533802
##  [51] -0.218741112 -0.198886132  0.265605639 -0.392032019  0.154145806
##  [56]  0.467383695  0.694146794 -0.291790777 -0.003869088  0.067492657
##  [61] -0.752101252  0.376021762 -0.016323724 -0.126548116  0.267208878
##  [66]  0.139327188 -0.084122195  0.920160373  0.813195166  0.039685491
##  [71]  0.101677889 -0.499777879  0.403443472  0.007294153 -0.372186338
##  [76]  0.068529859  0.140618696  0.050394737  0.404830882 -0.254532034
##  [81]  0.337154539  0.167243706  0.227491623  0.626154227 -0.099359380
##  [86]  0.200147619 -0.468357850 -0.397890716 -0.062379121  0.661642566
##  [91]  0.035477245 -1.008810154  0.260311114  0.388498810  0.129381352
##  [96] -0.098366118  0.522695415 -0.536131917  0.832977988 -0.175861021
## 
## $avar
## [1] 0.131286
## 
## $aic
## [1] -1.910377
## 
## $aicc
## [1] -0.878203
## 
## $bic
## [1] -1.754067
## 
## $se.phi
## [1] 0.009628394 0.024824361 0.027846609 0.027616707 0.010504271
## 
## $se.theta
## [1] 0
# approximation. Therefore, we difference the data once. Preliminary evidence suggesting differencing is useful is based 
# on the specified wandering and the damping sample autocorrelations. When using an overfit table, there was a factor close
# to (1-b)^2, but it was not very significant (third most significant using estimated_arma <- est.arma.wge(dftrans, p=15, q=1))
# so we decided to only use a first difference.

dftrans <- artrans.wge(df$low, phi.tr=1)
ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

aic5.wge(dftrans, type="aic") # aic = -2.001944
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##       p    q        aic
## 17    5    1  -2.001944
## 13    4    0  -1.985397
## 14    4    1  -1.966194
## 16    5    0  -1.966140
## 10    3    0  -1.934893
estimated_arma <- est.arma.wge(dftrans, p=5, q=0)
## 
## Coefficients of Original polynomial:  
## 0.1227 -0.1183 -0.1454 -0.2587 -0.0292 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0644B+0.6446B^2    0.8257+-0.9325i      0.8028       0.1347
## 1+0.8220B+0.3777B^2   -1.0882+-1.2098i      0.6146       0.3666
## 1+0.1198B             -8.3479               0.1198       0.5000
##   
## 
estimated_arma$phi
## [1]  0.12269667 -0.11827203 -0.14543411 -0.25874556 -0.02916165
#estimated_arma <- est.arma.wge(dftrans, p=15, q=1)
ljung.wge(estimated_arma$res, p=5, q=0) # Indication there is no serial correlation in the residuals of the model; this is a good fit.
## Obs -0.001755173 -0.0006645789 -0.03189077 -0.0558602 -0.03745613 -0.006650248 -0.07239689 -0.1447362 -0.01711384 -0.08391131 0.05274902 -0.02483137 0.045087 -0.06707539 -0.004246384 0.07628534 0.08560867 -0.0009574519 -0.1290798 0.02213024 0.01599169 -0.07501175 -0.1093225 -0.02137378
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 11.57463
## 
## $df
## [1] 19
## 
## $pval
## [1] 0.9029941
estimated_arma$avar # 0.1240151
## [1] 0.1240151
mean(df$low) # 38.53802
## [1] 38.53802
# (1-B)(1-0.12269667B+0.11827203B^2+0.14543411B^3+0.25874556B^4-0.02916165B^5)*(X_t + 38.53802) = a_t, (sigma-hat_a)^2 = 0.1240151
# (1-1.12269667B+0.00442464B^2+0.26370614B^3+0.11331145B^4-0.28790721B^5+0.02916165B^6)*(X_t + 38.53802) = a_t, (sigma-hat_a)^2 = 0.1240151

######
###### Plotting residuals

# the residuals of the model do not appear correlated. The sample autocorrelation and partial autocorrelation plots indicate 
# stationarity across all lags with all lags measured within the 5% limits
plotts.sample.wge(estimated_arma$res)
ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

## $autplt
##  [1]  1.0000000000 -0.0017551730 -0.0006645789 -0.0318907670 -0.0558601955
##  [6] -0.0374561274 -0.0066502482 -0.0723968930 -0.1447362269 -0.0171138352
## [11] -0.0839113138  0.0527490212 -0.0248313665  0.0450869990 -0.0670753909
## [16] -0.0042463842  0.0762853413  0.0856086721 -0.0009574519 -0.1290798025
## [21]  0.0221302437  0.0159916891 -0.0750117483 -0.1093225110 -0.0213737804
## [26]  0.0330170662
## 
## $freq
##  [1] 0.01010101 0.02020202 0.03030303 0.04040404 0.05050505 0.06060606
##  [7] 0.07070707 0.08080808 0.09090909 0.10101010 0.11111111 0.12121212
## [13] 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717 0.18181818
## [19] 0.19191919 0.20202020 0.21212121 0.22222222 0.23232323 0.24242424
## [25] 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929 0.30303030
## [31] 0.31313131 0.32323232 0.33333333 0.34343434 0.35353535 0.36363636
## [37] 0.37373737 0.38383838 0.39393939 0.40404040 0.41414141 0.42424242
## [43] 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747 0.48484848
## [49] 0.49494949
## 
## $db
##  [1]  -8.833455090   0.263787556 -11.498115493   2.117874832   2.272952545
##  [6]  -9.309718287   5.018377394   0.289213348  -1.892315270 -11.308615271
## [11]   3.725194391  -0.611197010 -12.632295451  -0.590314051   3.046760336
## [16] -12.371079560   0.004883422   2.510605713   4.337557356  -3.178637232
## [21] -17.271334056  -1.077447786  -2.257551049   3.231804970  -1.965461358
## [26]  -3.043185676  -1.825761971  -7.420727112   2.312103352  -1.619957827
## [31]  -7.841066943   5.001502888  -2.103213653   0.521620232   2.860045258
## [36]  -2.911236603  -9.089689687   4.807535654  -1.618776046 -13.746143879
## [41]  -6.930503276   3.442666207  -8.907109791  -0.989371943   3.916436743
## [46] -14.309350664   0.883844166   0.342804251  -6.422777409
## 
## $dbz
##  [1] -2.1421139616 -1.5268677158 -0.7764179465 -0.0840477881  0.4450188370
##  [6]  0.7709948539  0.8934228747  0.8371033041  0.6469486934  0.3863575660
## [11]  0.1329634098 -0.0354873769 -0.0674530562  0.0372285907  0.2260959001
## [16]  0.4205848653  0.5490294530  0.5673324474  0.4653779820  0.2642966345
## [21]  0.0072262109 -0.2555822359 -0.4825978402 -0.6484742078 -0.7390119560
## [26] -0.7427274437 -0.6522279590 -0.4757583702 -0.2446363728 -0.0055391248
## [31]  0.1965782737  0.3316246748  0.3873844148  0.3649753175  0.2727409570
## [36]  0.1240460268 -0.0586583169 -0.2366934318 -0.3546826477 -0.3582603902
## [41] -0.2277399300 -0.0007442644  0.2413219942  0.4108729200  0.4464568416
## [46]  0.3298807324  0.0947313908 -0.1721868052 -0.3517455314
par(mfrow=c(3,1))
plot(estimated_arma$res, ylab="ARIMA Residuals", xlab = "Trading Days", main = "ARIMA(5,1,1) Model Residuals over Time", lwd=2)
abline(h=mean(estimated_arma$res), col="blue")
acf(estimated_arma$res) # this seems to be white noise; all lags are within the 5% limits
pacf(estimated_arma$res) # this seems to be white noise; all lags are within the 5% limits
ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

aic5.wge(estimted_arma$res, type="aic") # White noise is the top model selected for the residuals, by AIC (ARMA(0,0))
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Error in aic calculation at 0 0 
## Error in aic calculation at 0 1 
## Error in aic calculation at 0 2 
## Error in aic calculation at 1 0 
## Error in aic calculation at 1 1 
## Error in aic calculation at 1 2 
## Error in aic calculation at 2 0 
## Error in aic calculation at 2 1 
## Error in aic calculation at 2 2 
## Error in aic calculation at 3 0 
## Error in aic calculation at 3 1 
## Error in aic calculation at 3 2 
## Error in aic calculation at 4 0 
## Error in aic calculation at 4 1 
## Error in aic calculation at 4 2 
## Error in aic calculation at 5 0 
## Error in aic calculation at 5 1 
## Error in aic calculation at 5 2 
## Five Smallest Values of  aic
##      p    q        aic
## 1    0    0     999999
## 2    0    1     999999
## 3    0    2     999999
## 4    1    0     999999
## 5    1    1     999999
######
######

######
###### Compare estimated model to differenced data

# We generated a model of 99 data points using the ARMA(5,1) identified by aic5 with a variance equal to that estimated from the
# differenced data, which is 0.1172604 (select to see origin highlighted above)
est_mod <- gen.aruma.wge(99, phi=estimated_arma$phi, theta=estimated_arma$theta, d=1, vara=estimated_arma$avar, sn=40)
ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

plotts.sample.wge(est_mod, arlimits=T) # estimated model
ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

## $autplt
##  [1] 1.0000000 0.9319568 0.8616508 0.8115232 0.7837261 0.7798462 0.7686385
##  [8] 0.7540206 0.7238039 0.6950334 0.6636351 0.6243891 0.5993385 0.5769297
## [15] 0.5519124 0.5366042 0.5296241 0.5194643 0.4818245 0.4413453 0.3916907
## [22] 0.3539872 0.3181363 0.2818305 0.2584268 0.2245354
## 
## $freq
##  [1] 0.01010101 0.02020202 0.03030303 0.04040404 0.05050505 0.06060606
##  [7] 0.07070707 0.08080808 0.09090909 0.10101010 0.11111111 0.12121212
## [13] 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717 0.18181818
## [19] 0.19191919 0.20202020 0.21212121 0.22222222 0.23232323 0.24242424
## [25] 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929 0.30303030
## [31] 0.31313131 0.32323232 0.33333333 0.34343434 0.35353535 0.36363636
## [37] 0.37373737 0.38383838 0.39393939 0.40404040 0.41414141 0.42424242
## [43] 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747 0.48484848
## [49] 0.49494949
## 
## $db
##  [1]  15.4662467   6.4612009   2.8156959  -6.5831271  -0.3491412
##  [6]   0.6467012 -23.4462233 -11.9860652  -4.2541616  -2.3339334
## [11]  -4.9973000  -0.5165929  -8.4672734  -5.9260513 -12.5070338
## [16]  -1.5329287  -9.4007029  -4.2193607 -12.8919484 -17.0150048
## [21]  -6.9836378 -14.0339519  -5.8623391 -15.3945590 -11.2233206
## [26] -24.0796617 -10.8749229 -18.3577480 -11.7127985  -7.5363064
## [31] -14.6647063 -14.4603439 -11.9075717 -18.9642429 -19.7115192
## [36] -13.9330289 -14.7859997 -12.3169328 -15.2897638 -15.2611997
## [41]  -7.5246856 -25.4508518 -13.1571835 -18.5429171 -14.5183414
## [46] -11.9244393 -15.4696702 -20.2139032 -16.6394603
## 
## $dbz
##  [1]  10.3894114   9.7014613   8.5461122   6.9166900   4.8252742
##  [6]   2.3519955  -0.2434927  -2.4200960  -3.6721480  -4.1238920
## [11]  -4.2332485  -4.2725076  -4.3507834  -4.5391300  -4.8913942
## [16]  -5.4201345  -6.0871771  -6.8234956  -7.5683596  -8.2978154
## [21]  -9.0157797  -9.7227007 -10.4014424 -11.0315750 -11.6029838
## [26] -12.1026903 -12.4960138 -12.7466330 -12.8695657 -12.9469286
## [31] -13.0762896 -13.3067957 -13.6115217 -13.9016587 -14.0770531
## [36] -14.0895034 -13.9687564 -13.7936668 -13.6492815 -13.6034164
## [41] -13.6994736 -13.9536233 -14.3531696 -14.8590436 -15.4153465
## [46] -15.9631305 -16.4485233 -16.8195565 -17.0238291
plotts.sample.wge(df$low, arlimits=T) # estimated model
ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

## $autplt
##  [1] 1.0000000 0.9542096 0.8987323 0.8524345 0.8100849 0.7813310 0.7600881
##  [8] 0.7371705 0.7123568 0.6828689 0.6476693 0.6153474 0.5864842 0.5600570
## [15] 0.5282542 0.4955941 0.4660246 0.4343222 0.4026243 0.3675647 0.3278754
## [22] 0.2900614 0.2568547 0.2354177 0.2239847 0.2178967
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
## [15] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28
## [29] 0.29 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42
## [43] 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  14.2537148   9.9541873   5.8173014  -0.5437278  -0.8235314
##  [6]  -1.3564258   1.4197169  -1.1176687  -7.3533813  -3.8782332
## [11]  -0.5445071  -3.3199576  -5.9250727  -5.4172047  -2.9311702
## [16] -11.1750831  -6.3519213 -18.5926329 -18.7079883  -8.4217804
## [21]  -9.3402655  -9.6710352  -9.7031584  -8.2549030 -14.1830462
## [26] -11.3498316 -12.9743644 -12.1531178 -16.0829501  -9.7837888
## [31] -12.9010407  -9.5737077 -12.8850744 -15.0135304 -10.3417581
## [36] -16.6716427 -13.1487825 -11.0083985 -16.2745553 -18.6304870
## [41] -17.7423184 -13.7665082 -16.0810139 -16.9046379 -13.1224062
## [46] -20.5533807 -18.2518162 -14.7903000 -19.2395748 -19.0265205
## 
## $dbz
##  [1]  10.6247182   9.9050329   8.7000676   7.0111987   4.8698055
##  [6]   2.3951954  -0.1075597  -2.1509930  -3.4458827  -4.1937204
## [11]  -4.7075065  -5.1324715  -5.5423563  -6.0346253  -6.7027509
## [16]  -7.5765092  -8.5979157  -9.6409675 -10.5694523 -11.2993670
## [21] -11.8169224 -12.1628463 -12.4196726 -12.6865263 -13.0293947
## [26] -13.4394270 -13.8345951 -14.1178754 -14.2557593 -14.2960188
## [31] -14.3127453 -14.3564021 -14.4497949 -14.6054534 -14.8345314
## [36] -15.1426627 -15.5243385 -15.9631650 -16.4335783 -16.8977220
## [41] -17.3041080 -17.6055003 -17.7946958 -17.9201902 -18.0549021
## [46] -18.2464892 -18.4904595 -18.7368379 -18.9191710 -18.9861139
#################################################################################################################################################
############### Confirmation that repeated estimated model visualizations of ACF and Spectral Density match that of original data ###############
#################################################################################################################################################

for( i in 1:5)
{
   SpecDen2 = parzen.wge(gen.aruma.wge(99, phi=estimated_arma$phi, theta=estimated_arma$theta, d=1, vara=estimated_arma$avar), plot = T)
   lines(SpecDen2$freq,SpecDen2$pzgram, lwd = 2, col = "red")
}
ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

for( i in 1:5)
{
   ACF2 = acf(gen.aruma.wge(99, phi=estimated_arma$phi, theta=estimated_arma$theta, d=1, vara=estimated_arma$avar), plot = T)
   lines(ACF2$lag ,ACF2$acf, lwd = 2, col = "red")
}
ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

#########################################################################################################
#########################################################################################################
#########################################################################################################

# after comparing sample spectral density and ACF plots, we comapred the AIC-estimated models to compare identified models
aic5.wge(dftrans)
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##       p    q        aic
## 17    5    1  -2.001944
## 13    4    0  -1.985397
## 14    4    1  -1.966194
## 16    5    0  -1.966140
## 10    3    0  -1.934893
aic5.wge(est_mod2) # ARMA(5,0) and ARMA(5,1) are top in both models, indicating a potentially useful model..
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Error in aic calculation at 0 0 
## Error in aic calculation at 0 1 
## Error in aic calculation at 0 2 
## Error in aic calculation at 1 0 
## Error in aic calculation at 1 1 
## Error in aic calculation at 1 2 
## Error in aic calculation at 2 0 
## Error in aic calculation at 2 1 
## Error in aic calculation at 2 2 
## Error in aic calculation at 3 0 
## Error in aic calculation at 3 1 
## Error in aic calculation at 3 2 
## Error in aic calculation at 4 0 
## Error in aic calculation at 4 1 
## Error in aic calculation at 4 2 
## Error in aic calculation at 5 0 
## Error in aic calculation at 5 1 
## Error in aic calculation at 5 2 
## Five Smallest Values of  aic
##      p    q        aic
## 1    0    0     999999
## 2    0    1     999999
## 3    0    2     999999
## 4    1    0     999999
## 5    1    1     999999
######
######

######
###### Forecast and Measure Average Squared Errors (ASE)

# Model Forecast and ASE
# Forecasting the differenced data using the parameters estimated from it (we do not apply a d=1)
nonseasonal.forecast <- fore.aruma.wge(df$low, phi=estimated_arma$phi, theta=estimated_arma$theta, d=1,n.ahead=5, lastn=T)
ARIMA(p,d,q) Model

ARIMA(p,d,q) Model

ARIMA_ASE = mean((df$low[(nrow(df)-5+1):nrow(df)] - nonseasonal.forecast$f)^2)
ARIMA_ASE# 0.2026529
## [1] 0.2026529
######
######

Multivariate Regression Time Series Modeling

Following the univariate modeling of this time series, we decided to enhance our model with multivariate factors, both additive and multiplicative. Furthermore, based on the candlestick plot we created for exploratory data analysis, we were interested in exploring the relationship of the ranges of open to close and high to low prices as related to the low price. Therefore, we engineered two features to capture these two ranges for each data point. Additionally, we added an individual time component.

Initially, we performed analysis on lag behavior and concluded there is not enough significant evidence in the correlation structures between our variables to indicate lagging of variables should be provided. From this point we continued our analysis.

Note that while not included in our presentation, and performed only as an aside, we performed multivariate regression with the same variables to gain insight into the improvements in forecasting accuracy gained from developing the Vector Autoregressive and Neural Network with Multi-Layered Perceptrons provides. As expected, the improvements were profound. For comparison, our best Neural Network forecasting scored an Average Squared Error of 0.05711815 whereas our best Vector Autoregressive model scored 0.1703138 and our best Multivariate Regression model scored

Vector Autoregressive Modeling

Vector Autoregressive Modeling without Trend

The vector Autoregressive models used take all the training data - including for the exogenous variables - to bulid a model. We forecasted 5 data points ahead from this trainin set of 95 out of 100 data points and then generated the overall ASE and the 5-point rolling window ASE for model comparison.

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
plotts.sample.wge(df$volume, arlimits=T)

## $autplt
##  [1]  1.000000000  0.332654213 -0.015323504 -0.185197647  0.005818653
##  [6]  0.124607489  0.059217024 -0.058842730 -0.135327419 -0.128903104
## [11] -0.111554245 -0.136230829 -0.177090893 -0.102374463 -0.010295688
## [16] -0.012154962 -0.091622722 -0.032089912 -0.074665693 -0.032921963
## [21] -0.004581175  0.213392652  0.211975490  0.061996282 -0.059048222
## [26] -0.052510741
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
## [15] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28
## [29] 0.29 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42
## [43] 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  -7.38499578  -0.95161030   3.39895369   3.13204413   7.25816141
##  [6]  -3.34022975   1.45696680  -3.16420812   0.98653170  -0.01510512
## [11]   0.74899492  -3.54602921  -7.04898470   6.69701287   0.21917900
## [16]   3.26040620  -2.30071218   5.83627330   0.12105751   4.40156709
## [21]   1.31507769   1.49404141   4.21104386  -7.07865344 -14.93552255
## [26]  -1.90529423  -9.05704419   1.17291691  -2.44739663 -15.18817408
## [31]  -3.82537165  -8.97778373  -1.31847009  -8.52554281  -9.96514339
## [36]  -0.84871304  -2.04248679  -9.19693386  -5.37219836  -6.41835850
## [41]  -0.84106671 -15.69187726   0.78348200  -7.78153910 -19.81844091
## [46]  -1.92362217  -0.03109830  -1.67305303  -6.34494742  -2.36851032
## 
## $dbz
##  [1]  0.7617948  1.3396040  1.9342166  2.3252670  2.4227279  2.2272880
##  [7]  1.8074853  1.2989121  0.8916181  0.7572102  0.9377718  1.3279105
## [13]  1.7766211  2.1767262  2.4818412  2.6857904  2.7968223  2.8178953
## [19]  2.7371907  2.5301127  2.1698520  1.6414415  0.9557012  0.1601924
## [25] -0.6606149 -1.4101839 -2.0301647 -2.5352613 -2.9884786 -3.4384161
## [31] -3.8748905 -4.2313776 -4.4336135 -4.4602264 -4.3559544 -4.1882068
## [37] -4.0026903 -3.8148228 -3.6258049 -3.4384625 -3.2600843 -3.0965714
## [43] -2.9492483 -2.8191207 -2.7129243 -2.6423490 -2.6148787 -2.6237376
## [49] -2.6467693 -2.6582701
# First Variable has the "time shift"
ccf(df$volume,df$low, type="correlation") # nothing too significant to warrant lagging volume
ccf(df$HiLo,df$low, type="correlation") # no strong cross-correlation
ccf(df$OpClo,df$low, type="correlation") # Open-Close does not require lagging.
df2 <- df[1:(nrow(df)-5),]
VARselect(df2$low, lag.max=6, season=NULL, exogen=data.frame(df2$volume, df2$HiLo, df2$OpClo), type="const") # AIC picked p=1 with AIC -2.3235
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      2      2      5 
## 
## $criteria
##                  1           2           3           4          5
## AIC(n) -2.32345681 -2.42117979 -2.41376593 -2.40362716 -2.4365656
## HQ(n)  -2.26710299 -2.35355521 -2.33487058 -2.31346105 -2.3351287
## SC(n)  -2.18364578 -2.25340655 -2.21803048 -2.17992951 -2.1849057
## FPE(n)  0.09794606  0.08883496  0.08950683  0.09043349  0.0875214
##                  6
## AIC(n) -2.42032626
## HQ(n)  -2.30761863
## SC(n)  -2.14070419
## FPE(n)  0.08897736
lsfit <- VAR(y=cbind(df2$low, df2$volume, df2$HiLo, df2$OpClo), p=1, type="const") # with p=5, this will estimate the coefficients for lag 5
preds <- predict(lsfit, n.ahead=5)
ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$fcst$y1[,1])^2)
ASE # 0.2383062
## [1] 0.2383062
trainingSize = 91
horizon = 5
ASEHolder_2 = numeric()

for( i in 1:5)
{
  
  VARselect(df2$low[i:(i + (trainingSize - 1))], lag.max=6, season=NULL, exogen=data.frame(df2$volume[i:(i + (trainingSize - 1))], df2$HiLo[i:(i + (trainingSize - 1))], df2$OpClo[i:(i + (trainingSize - 1))]), type="const") # AIC picked p=1 with AIC -2.3235
  lsfit <- VAR(y=cbind(df2$low[i:(i + (trainingSize - 1))], df2$volume[i:(i + (trainingSize - 1))], df2$HiLo[i:(i + (trainingSize - 1))], df2$OpClo[i:(i + (trainingSize - 1))]), p=1, type="const")
  preds <- predict(lsfit, n.ahead=5)
 # forecasts = fore.arma.wge(midterm2020[i:(i + (trainingSize - 1))], phi=phis, theta=thetas, n.ahead=horizon, lastn=F, plot=F)
  ASE_2 = mean((df$low[(trainingSize + i):(trainingSize + i + (horizon) - 1)] - preds$fcst$y1[,1])^2)
  
  ASEHolder_2[i] = ASE_2
  
}

WindowedASE_mod2 = mean(ASEHolder_2)
WindowedASE_mod2 # 0.3102587
## [1] 0.3745635
plot(seq(1,100,1), df$low[1:100], type = "l", xlim = c(0,100), xlab = "Trading Days", ylab = "Trade Prices", main = "Vector Autoregressive Modeling without Trend", lwd=2)
lines(seq(96,100,1), preds$fcst$y1[,1], type = "l", col = "red", lwd=2)

Vector Autoregressive Modeling with Trend

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
plotts.sample.wge(df$volume, arlimits=T)

## $autplt
##  [1]  1.000000000  0.332654213 -0.015323504 -0.185197647  0.005818653
##  [6]  0.124607489  0.059217024 -0.058842730 -0.135327419 -0.128903104
## [11] -0.111554245 -0.136230829 -0.177090893 -0.102374463 -0.010295688
## [16] -0.012154962 -0.091622722 -0.032089912 -0.074665693 -0.032921963
## [21] -0.004581175  0.213392652  0.211975490  0.061996282 -0.059048222
## [26] -0.052510741
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
## [15] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28
## [29] 0.29 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42
## [43] 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  -7.38499578  -0.95161030   3.39895369   3.13204413   7.25816141
##  [6]  -3.34022975   1.45696680  -3.16420812   0.98653170  -0.01510512
## [11]   0.74899492  -3.54602921  -7.04898470   6.69701287   0.21917900
## [16]   3.26040620  -2.30071218   5.83627330   0.12105751   4.40156709
## [21]   1.31507769   1.49404141   4.21104386  -7.07865344 -14.93552255
## [26]  -1.90529423  -9.05704419   1.17291691  -2.44739663 -15.18817408
## [31]  -3.82537165  -8.97778373  -1.31847009  -8.52554281  -9.96514339
## [36]  -0.84871304  -2.04248679  -9.19693386  -5.37219836  -6.41835850
## [41]  -0.84106671 -15.69187726   0.78348200  -7.78153910 -19.81844091
## [46]  -1.92362217  -0.03109830  -1.67305303  -6.34494742  -2.36851032
## 
## $dbz
##  [1]  0.7617948  1.3396040  1.9342166  2.3252670  2.4227279  2.2272880
##  [7]  1.8074853  1.2989121  0.8916181  0.7572102  0.9377718  1.3279105
## [13]  1.7766211  2.1767262  2.4818412  2.6857904  2.7968223  2.8178953
## [19]  2.7371907  2.5301127  2.1698520  1.6414415  0.9557012  0.1601924
## [25] -0.6606149 -1.4101839 -2.0301647 -2.5352613 -2.9884786 -3.4384161
## [31] -3.8748905 -4.2313776 -4.4336135 -4.4602264 -4.3559544 -4.1882068
## [37] -4.0026903 -3.8148228 -3.6258049 -3.4384625 -3.2600843 -3.0965714
## [43] -2.9492483 -2.8191207 -2.7129243 -2.6423490 -2.6148787 -2.6237376
## [49] -2.6467693 -2.6582701
# First Variable has the "time shift"
ccf(df$volume,df$low, type="correlation") # nothing too significant to warrant lagging volume
ccf(df$HiLo,df$low, type="correlation") # no strong cross-correlation
ccf(df$OpClo,df$low, type="correlation") # Open-Close does not require lagging.

df2 <- df[1:(nrow(df)-5),] 
t <- 1:100
# Because of the white noise cross-correlation between all exogenous variables and the response variable, it's reasonable to expect only an AR(1) from VARselect. The model will quickly approximate the signal's mean without much sensitivity to change.
VARselect(df2$low, lag.max=6, season=NULL, exogen=data.frame(t[1:95], df2$volume, df2$HiLo, df2$OpClo), type="const") # AIC:  p=1 with AIC -2.3158
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      2      2 
## 
## $criteria
##                 1           2           3           4           5
## AIC(n) -2.3288269 -2.49848113 -2.47601445 -2.45354442 -2.45473227
## HQ(n)  -2.2612024 -2.41958579 -2.38584835 -2.35210755 -2.34202465
## SC(n)  -2.1610537 -2.30274568 -2.25231680 -2.20188456 -2.17511021
## FPE(n)  0.0974299  0.08223654  0.08411857  0.08604793  0.08596807
##                  6
## AIC(n) -2.45522002
## HQ(n)  -2.33124163
## SC(n)  -2.14763575
## FPE(n)  0.08595343
lsfit <- VAR(y=cbind(df2$low, t[1:95], df2$volume, df2$HiLo, df2$OpClo), p=1, type="const") # with p=1, this will estimate the coefficients for lag 1
preds <- predict(lsfit, n.ahead=5)

ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$fcst$y1[,1])^2)
ASE # 0.1132188
## [1] 0.1132188
t = 1:100
trainingSize = 91
horizon = 5
ASEHolder_2 = numeric()

for( i in 1:5)
{
  
  VARselect(df2$low[i:(i + (trainingSize - 1))], lag.max=6, season=NULL, exogen=data.frame(t[i:(i + (trainingSize - 1))], df2$volume[i:(i + (trainingSize - 1))], df2$HiLo[i:(i + (trainingSize - 1))], df2$OpClo[i:(i + (trainingSize - 1))]), type="const") # AIC picked p=1 with AIC -2.3235
  lsfit <- VAR(y=cbind(df2$low[i:(i + (trainingSize - 1))], exogen = data.frame(t[i:(i + (trainingSize - 1))], df2$volume[i:(i + (trainingSize - 1))], df2$HiLo[i:(i + (trainingSize - 1))], df2$OpClo[i:(i + (trainingSize - 1))])), p=1, type="const")
  preds <- predict(lsfit, n.ahead=5)

  ASE_2 = mean((df$low[(trainingSize + i):(trainingSize + i + (horizon) - 1)] - preds$fcst$df2.low.i..i....trainingSize...1...[,1])^2)
  
  ASEHolder_2[i] = ASE_2
  
}

WindowedASE_mod2 = mean(ASEHolder_2)
WindowedASE_mod2 # 0.2139037
## [1] 0.1845027
plot(seq(1,100,1), df$low[1:100], type = "l",xlim = c(0,100), xlab = "Trading Days", ylab = "Trade Prices", main = "Vector Autoregressive Modeling with Trend", lwd=2)
lines(seq(96,100,1), preds$fcst$df2.low.i..i....trainingSize...1...[,1], type = "l", col = "red", lwd=2)

Vector Autoregressive Modeling with Trend, Time and Volume Interaction

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
# First Variable has the "time shift"
ccf(df$volume,df$low, type="correlation") # nothing too significant to warrant lagging volume

ccf(df$HiLo,df$low, type="correlation") # no strong cross-correlation

ccf(df$OpClo,df$low, type="correlation") # Open-Close does not require lagging.

df2 <- df[1:(nrow(df)-5),]
t <- 1:100
# Because of the white noise cross-correlation between all exogenous variables and the response variable, it's reasonable to expect only an AR(1) from VARselect. The model will quickly approximate the signal's mean without much sensitivity to change.
#VARselect(df2$low, lag.max=6, type="const", season=NULL, exogen=data.frame(t[96:100], df2$volume, df2$HiLo, df2$OpClo, t[96:100]*df2$volume)) # AIC:  p=1 with AIC -2.3158

# trend added manually:
VARselect(df2$low, lag.max=6, season=NULL, exogen=data.frame(t[1:95], df2$volume, df2$HiLo, df2$OpClo, t[1:95]*df2$volume), type="const") # AIC:  p=1 with AIC -2.3081
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      2      2 
## 
## $criteria
##                  1           2         3           4           5
## AIC(n) -2.30801515 -2.47800080 -2.455533 -2.43306769 -2.43298718
## HQ(n)  -2.22911981 -2.38783470 -2.354096 -2.32036006 -2.30900879
## SC(n)  -2.11227971 -2.25430315 -2.203873 -2.15344563 -2.12540291
## FPE(n)  0.09949085  0.08395165  0.085877  0.08785085  0.08788582
##                  6
## AIC(n) -2.43521003
## HQ(n)  -2.29996088
## SC(n)  -2.09966355
## FPE(n)  0.08772417
# trend added manually:
lsfit <- VAR(y=cbind(df2$low, t[1:95], df2$volume, df2$HiLo, df2$OpClo, t[1:95]*df2$volume), p=1, type="const") # with p=1, this will estimate the coefficients for lag 1
preds <- predict(lsfit, n.ahead=5)

ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$fcst$y1[,1])^2)
ASE # 0.1296242
## [1] 0.1296242
t = 1:100
trainingSize = 91
horizon = 5
ASEHolder_2 = numeric()

for( i in 1:5)
{
  
  VARselect(df2$low[i:(i + (trainingSize - 1))], lag.max=6, season=NULL, exogen=data.frame(t[i:(i + (trainingSize - 1))], df2$volume[i:(i + (trainingSize - 1))], df2$HiLo[i:(i + (trainingSize - 1))], df2$OpClo[i:(i + (trainingSize - 1))], ((df2$volume[i:(i + (trainingSize - 1))]) * t[i:(i + (trainingSize - 1))])), type="const") # AIC picked p=2 with AIC -2.44854735
  lsfit <- VAR(y=cbind(df2$low[i:(i + (trainingSize - 1))], exogen = data.frame(t[i:(i + (trainingSize - 1))], df2$volume[i:(i + (trainingSize - 1))], df2$HiLo[i:(i + (trainingSize - 1))], df2$OpClo[i:(i + (trainingSize - 1))], ((df2$volume[i:(i + (trainingSize - 1))]) * t[i:(i + (trainingSize - 1))]))), p=1, type="const")
  preds <- predict(lsfit, n.ahead=5)

  ASE_2 = mean((df$low[(trainingSize + i):(trainingSize + i + (horizon) - 1)] - preds$fcst$df2.low.i..i....trainingSize...1...[,1])^2)
  
  ASEHolder_2[i] = ASE_2
  
}

WindowedASE_mod2 = mean(ASEHolder_2)
WindowedASE_mod2 # 0.1869798
## [1] 0.1869798
plot(seq(1,100,1), df$low[1:100], type = "l",xlim = c(0,100), xlab = "Trading Days", ylab = "Trade Prices", main = "Vector Autoregressive Modeling with Trend", lwd=2)
lines(seq(96,100,1), preds$fcst$df2.low.i..i....trainingSize...1...[,1], type = "l", col = "red", lwd=2)

Vector Autoregressive Modeling with Trend, Time and Volume Interaction, Time and OpClo Interaction

It is important to note that we did not identify an interaction between time and HiLo to benefit the model, but we did find that interaction with OpClo and time to be beneficial for reducing ASE.

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
plotts.sample.wge(df$volume, arlimits=T)

## $autplt
##  [1]  1.000000000  0.332654213 -0.015323504 -0.185197647  0.005818653
##  [6]  0.124607489  0.059217024 -0.058842730 -0.135327419 -0.128903104
## [11] -0.111554245 -0.136230829 -0.177090893 -0.102374463 -0.010295688
## [16] -0.012154962 -0.091622722 -0.032089912 -0.074665693 -0.032921963
## [21] -0.004581175  0.213392652  0.211975490  0.061996282 -0.059048222
## [26] -0.052510741
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
## [15] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28
## [29] 0.29 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42
## [43] 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  -7.38499578  -0.95161030   3.39895369   3.13204413   7.25816141
##  [6]  -3.34022975   1.45696680  -3.16420812   0.98653170  -0.01510512
## [11]   0.74899492  -3.54602921  -7.04898470   6.69701287   0.21917900
## [16]   3.26040620  -2.30071218   5.83627330   0.12105751   4.40156709
## [21]   1.31507769   1.49404141   4.21104386  -7.07865344 -14.93552255
## [26]  -1.90529423  -9.05704419   1.17291691  -2.44739663 -15.18817408
## [31]  -3.82537165  -8.97778373  -1.31847009  -8.52554281  -9.96514339
## [36]  -0.84871304  -2.04248679  -9.19693386  -5.37219836  -6.41835850
## [41]  -0.84106671 -15.69187726   0.78348200  -7.78153910 -19.81844091
## [46]  -1.92362217  -0.03109830  -1.67305303  -6.34494742  -2.36851032
## 
## $dbz
##  [1]  0.7617948  1.3396040  1.9342166  2.3252670  2.4227279  2.2272880
##  [7]  1.8074853  1.2989121  0.8916181  0.7572102  0.9377718  1.3279105
## [13]  1.7766211  2.1767262  2.4818412  2.6857904  2.7968223  2.8178953
## [19]  2.7371907  2.5301127  2.1698520  1.6414415  0.9557012  0.1601924
## [25] -0.6606149 -1.4101839 -2.0301647 -2.5352613 -2.9884786 -3.4384161
## [31] -3.8748905 -4.2313776 -4.4336135 -4.4602264 -4.3559544 -4.1882068
## [37] -4.0026903 -3.8148228 -3.6258049 -3.4384625 -3.2600843 -3.0965714
## [43] -2.9492483 -2.8191207 -2.7129243 -2.6423490 -2.6148787 -2.6237376
## [49] -2.6467693 -2.6582701
# First Variable has the "time shift"
ccf(df$volume,df$low, type="correlation") # nothing too significant to warrant lagging volume
ccf(df$HiLo,df$low, type="correlation") # no strong cross-correlation
ccf(df$OpClo,df$low, type="correlation") # Open-Close does not require lagging.

df2 <- df[1:(nrow(df)-5),]
t <- 1:100
# Because of the white noise cross-correlation between all exogenous variables and the response variable, it's reasonable to expect only an AR(1) from VARselect. The model will quickly approximate the signal's mean without much sensitivity to change.

# find a good count for p for the AR(p) component of the vector autoregressive model. No indication above for lag was required, but :
VARselect(df2$low, lag.max=8, season=NULL, exogen=data.frame(t[1:95], df2$volume, df2$HiLo, df2$OpClo, t[1:95]*df2$volume, t[1:95]*df2$OpClo), type="const") # selects p=1 as the best value for AR(p)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      2      2 
## 
## $criteria
##                 1           2           3           4           5
## AIC(n) -2.2713589 -2.46459045 -2.44187917 -2.41889398 -2.41500677
## HQ(n)  -2.1800536 -2.36187199 -2.32774754 -2.29334919 -2.27804882
## SC(n)  -2.0446087 -2.20949651 -2.15844146 -2.10711249 -2.07488151
## FPE(n)  0.1032257  0.08510686  0.08708604  0.08914122  0.08952502
##                  6           7           8
## AIC(n) -2.41422092 -2.40044609 -2.37762703
## HQ(n)  -2.26584981 -2.24066181 -2.20642959
## SC(n)  -2.04575189 -2.00363329 -1.95247046
## FPE(n)  0.08963886  0.09093375  0.09309378
# Build the model using the p identified for AR(p) and the training data. type="const" because trend is included
lsfit <- VAR(y=cbind(df2$low, t[1:95], df2$volume, df2$HiLo, df2$OpClo, t[1:95]*df2$volume, t[1:95]*df2$OpClo), p=1, type="const") # with p=1, this will estimate the coefficients for lag 1

preds <- predict(lsfit, n.ahead=5)

ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$fcst$y1[,1])^2)
ASE # 0.1703138
## [1] 0.1703138
t = 1:100
trainingSize = 91
horizon = 5
ASEHolder_2 = numeric()

for( i in 1:5)
{
  
  VARselect(df2$low[i:(i + (trainingSize - 1))], lag.max=6, season=NULL, exogen=data.frame(t[i:(i + (trainingSize - 1))], df2$volume[i:(i + (trainingSize - 1))], df2$HiLo[i:(i + (trainingSize - 1))], df2$OpClo[i:(i + (trainingSize - 1))], ((df2$OpClo[i:(i + (trainingSize - 1))]) * t[i:(i + (trainingSize - 1))]),((df2$volume[i:(i + (trainingSize - 1))]) * t[i:(i + (trainingSize - 1))])), type="const")
  
  lsfit <- VAR(y=cbind(df2$low[i:(i + (trainingSize - 1))], exogen = data.frame(t[i:(i + (trainingSize - 1))], df2$volume[i:(i + (trainingSize - 1))], df2$HiLo[i:(i + (trainingSize - 1))], df2$OpClo[i:(i + (trainingSize - 1))], ((df2$volume[i:(i + (trainingSize - 1))]) * t[i:(i + (trainingSize - 1))])), ((df2$OpClo[i:(i + (trainingSize - 1))]) * t[i:(i + (trainingSize - 1))])), p=1, type="const")
  preds <- predict(lsfit, n.ahead=5)

  ASE_2 = mean((df$low[(trainingSize + i):(trainingSize + i + (horizon) - 1)] - preds$fcst$df2.low.i..i....trainingSize...1...[1])^2)
  
  ASEHolder_2[i] = ASE_2
  
}

WindowedASE_mod2 = mean(ASEHolder_2)
WindowedASE_mod2 # 0.2288478
## [1] 0.2288478
plot(seq(1,100,1), df$low[1:100], type = "l",xlim = c(0,100), xlab = "Trading Days", ylab = "Trade Prices", main = "Vector Autoregressive Modeling with Trend", lwd=2)
lines(seq(96,100,1), preds$fcst$df2.low.i..i....trainingSize...1...[,1], type = "l", col = "red", lwd=2)

Neural Network Models

Neural Network without Trend

Because prices can change quickly from day-to-day and there is considerable time required to calculate rolling averages for competing neural networks (training time is extensive), we did not provide a rolling window ASE.

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
# First Variable has the "time shift"
ccf(df$volume,df$low, type="correlation") # nothing too significant to warrant lagging volume

ccf(df$HiLo,df$low, type="correlation") # no strong cross-correlation

ccf(df$OpClo,df$low, type="correlation") # Open-Close does not require lagging.

dfTrain <- df[1:(nrow(df)-5),]
lowTest <- df$low[(nrow(df)-4):nrow(df)] # forecast actuals
lowTrain <- ts(dfTrain$low) # a training set of the target minus the test forecast
volumeFull <- ts(df$volume) # the full dataset (train + test) of xreg
HiLoFull <- ts(df$HiLo) # the full dataset (train + test) of xreg
OpCloFull <- ts(df$OpClo) # the full dataset (train + test) of xreg

fit.mlp <- mlp(lowTrain, xreg=data.frame(volumeFull, HiLoFull, OpCloFull), reps = 15, comb = "mean", hd.auto.type="cv")
fit.mlp
## MLP fit with 2 hidden nodes and 15 repetitions.
## Univariate lags: (1,2)
## 2 regressors included.
## - Regressor 1 lags: (1,2)
## - Regressor 2 lags: (1,2,4)
## Forecast combined using the mean operator.
## MSE: 0.0675.
plot(fit.mlp)

fore.mlp <- forecast(fit.mlp, h=5, xreg=data.frame(volumeFull, HiLoFull, OpCloFull))
plot(fore.mlp)

ASE = mean((lowTest - fore.mlp$mean)^2)
ASE # 0.1758194
## [1] 0.2051909
t = 1:100
trainingSize = 95
horizon = 5
ASEHolder_2 = numeric()

Neural Network with Trend

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
t <- 1:nrow(df)
# First Variable has the "time shift"
ccf(df$volume,df$low, type="correlation") # nothing too significant to warrant lagging volume

ccf(df$HiLo,df$low, type="correlation") # no strong cross-correlation

ccf(df$OpClo,df$low, type="correlation") # Open-Close does not require lagging.

dfTrain <- df[1:(nrow(df)-5),]
lowTest <- df$low[(nrow(df)-4):nrow(df)] # forecast actuals
lowTrain <- ts(dfTrain$low) # a training set of the target minus the test forecast
volumeFull <- ts(df$volume) # the full dataset (train + test) of xreg
HiLoFull <- ts(df$HiLo) # the full dataset (train + test) of xreg
OpCloFull <- ts(df$OpClo) # the full dataset (train + test) of xreg

fit.mlp <- mlp(lowTrain, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull), reps = 15, comb = "mean", hd.auto.type="cv")
fit.mlp
## MLP fit with 3 hidden nodes and 15 repetitions.
## Univariate lags: (1,2)
## 3 regressors included.
## - Regressor 1 lags: (1)
## - Regressor 2 lags: (2)
## - Regressor 3 lags: (1,2)
## Forecast combined using the mean operator.
## MSE: 0.0679.
plot(fit.mlp)

fore.mlp <- forecast(fit.mlp, h=5, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull))
plot(fore.mlp)

ASE = mean((lowTest - fore.mlp$mean)^2)
ASE # 0.06421939
## [1] 0.06402857

Neural Network with Trend, Time and Volume Interaction

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
t <- 1:nrow(df)
# First Variable has the "time shift"
ccf(df$volume,df$low, type="correlation") # nothing too significant to warrant lagging volume

ccf(df$HiLo,df$low, type="correlation") # no strong cross-correlation

ccf(df$OpClo,df$low, type="correlation") # Open-Close does not require lagging.

dfTrain <- df[1:(nrow(df)-5),]
lowTest <- df$low[(nrow(df)-4):nrow(df)] # forecast actuals
lowTrain <- ts(dfTrain$low) # a training set of the target minus the test forecast
volumeFull <- ts(df$volume) # the full dataset (train + test) of xreg
HiLoFull <- ts(df$HiLo) # the full dataset (train + test) of xreg
OpCloFull <- ts(df$OpClo) # the full dataset (train + test) of xreg

fit.mlp <- mlp(lowTrain, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull, t*volumeFull), reps = 15, comb = "mean", hd.auto.type="cv")
fit.mlp
## MLP fit with 1 hidden node and 15 repetitions.
## Univariate lags: (1,2)
## 3 regressors included.
## - Regressor 1 lags: (1)
## - Regressor 2 lags: (2)
## - Regressor 3 lags: (1,2)
## Forecast combined using the mean operator.
## MSE: 0.0837.
plot(fit.mlp)

fore.mlp <- forecast(fit.mlp, h=5, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull, t*volumeFull))
plot(fore.mlp)

ASE = mean((lowTest - fore.mlp$mean)^2)
ASE # 0.05486796
## [1] 0.06318744

Neural Network with Trend, Time and Volume Interaction, Time and OpClo Interaction

It is important to note that we did not identify an interaction between time and HiLo to benefit the model, but we did find that interaction with OpClo and time to be beneficial for reducing ASE.

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
t <- 1:nrow(df)

# First Variable has the "time shift"
ccf(df$volume,df$low, type="correlation") # nothing too significant to warrant lagging volume

ccf(df$HiLo,df$low, type="correlation") # no strong cross-correlation

ccf(df$OpClo,df$low, type="correlation") # Open-Close does not require lagging.

dfTrain <- df[1:(nrow(df)-5),]
lowTest <- df$low[(nrow(df)-4):nrow(df)] # forecast actuals
lowTrain <- ts(dfTrain$low) # a training set of the target minus the test forecast
volumeFull <- ts(df$volume) # the full dataset (train + test) of xreg
HiLoFull <- ts(df$HiLo) # the full dataset (train + test) of xreg
High <- ts(df$high)
OpCloFull <- ts(df$OpClo) # the full dataset (train + test) of xreg

fit.mlp <- mlp(lowTrain, xreg=data.frame(t, volumeFull, High, OpCloFull, t*volumeFull, t*OpCloFull), reps = 15, comb = "mean", hd.auto.type="cv", difforder=NULL)
fit.mlp
## MLP fit with 7 hidden nodes and 15 repetitions.
## Univariate lags: (1)
## 3 regressors included.
## - Regressor 1 lags: (1)
## - Regressor 2 lags: (2)
## - Regressor 3 lags: (1,2)
## Forecast combined using the mean operator.
## MSE: 0.0615.
plot(fit.mlp)

fore.mlp <- forecast(fit.mlp, h=5, xreg=data.frame(t, volumeFull, High, OpCloFull, t*volumeFull, t*OpCloFull))
plot(fore.mlp)

ASE = mean((lowTest - fore.mlp$mean)^2)
ASE # 0.05711815
## [1] 0.05149625
df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
#MLP
ts = df$low
batch_size = 91
start = 1
num_batches = length(ts)-batch_size+1
ASEs = numeric(num_batches)
dfTrain <- df[1:(nrow(df)-5),]
lowTest <- df$low[(nrow(df)-4):nrow(df)] # forecast actuals
lowTrain <- ts(dfTrain$low) # a training set of the target minus the test forecast
volumeFull <- ts(df$volume) # the full dataset (train + test) of xreg
HiLoFull <- ts(df$HiLo) # the full dataset (train + test) of xreg
OpCloFull <- ts(df$OpClo) # the full dataset (train + test) of xreg

#Rolling ASE
for (i in 0:(num_batches-1))
{
  fit.mlp <- mlp(ts(ts[i:(i+(batch_size-1))]), xreg=data.frame(volumeFull[i:(i+(batch_size-1))], HiLoFull[i:(i+(batch_size-1))], OpCloFull[i:(i+(batch_size-1))]), reps = 15, comb = "mean", hd.auto.type="cv")
  fore.mlp <- forecast(fit.mlp, h=5, xreg=data.frame(volumeFull, HiLoFull, OpCloFull))
  ASEs[i+1] = mean((ts[(length(ts) - 4):length(ts)] - fore.mlp$mean)^2)
  start = start+1
}
ASEs
##  [1] 0.2277213 0.2140041 0.9741110 1.3021377 0.9929071 0.7479621 0.4694016
##  [8] 0.2220813 0.3978031 0.1986235
mean(ASEs)
## [1] 0.5746753
signoise.forecast <- fore.sigplusnoise.wge(df$low, max.p=2, n.ahead=5, limits=T, lastn=T, plot=T)
## 
## Coefficients of Original polynomial:  
## 1.0507 -0.3152 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0507B+0.3152B^2    1.6667+-0.6283i      0.5614       0.0574
##   
## 

SIGNOISE_ASE = mean((df$low[(nrow(df)-5+1):nrow(df)] - signoise.forecast$f)^2)
SIGNOISE_ASE
## [1] 0.133713
#Signal + Noise
df2 <- df[1:(nrow(df)-5),]
ts = df$low
t2 = 1:95
batch_size = 91
start = 1
t = 100
num_batches = length(ts)-batch_size+1
ASEs = numeric(num_batches)

for (i in 0:(num_batches-1))
{
  
  signoise.forecast <- fore.sigplusnoise.wge(df$low[i:(i+(batch_size-1))], max.p=2, n.ahead=5, limits=T, lastn=T, plot=T)

  ASEs[i+1] = mean((df$low[(nrow(df)-4):nrow(df)] - signoise.forecast$f)^2)
}
## 
## Coefficients of Original polynomial:  
## 1.1449 -0.4350 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.1449B+0.4350B^2    1.3161+-0.7529i      0.6595       0.0827
##   
## 

## 
## Coefficients of Original polynomial:  
## 1.1440 -0.4314 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.1440B+0.4314B^2    1.3260+-0.7482i      0.6568       0.0818
##   
## 

## 
## Coefficients of Original polynomial:  
## 1.0167 -0.3128 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0167B+0.3128B^2    1.6252+-0.7456i      0.5593       0.0685
##   
## 

## 
## Coefficients of Original polynomial:  
## 1.0780 -0.3853 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0780B+0.3853B^2    1.3988+-0.7991i      0.6208       0.0826
##   
## 

## 
## Coefficients of Original polynomial:  
## 1.0997 -0.4007 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0997B+0.4007B^2    1.3724+-0.7825i      0.6330       0.0825
##   
## 

## 
## Coefficients of Original polynomial:  
## 1.0963 -0.3898 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0963B+0.3898B^2    1.4063+-0.7667i      0.6243       0.0794
##   
## 

## 
## Coefficients of Original polynomial:  
## 1.1033 -0.3869 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.1033B+0.3869B^2    1.4258+-0.7427i      0.6220       0.0764
##   
## 

## 
## Coefficients of Original polynomial:  
## 1.1125 -0.3945 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.1125B+0.3945B^2    1.4102+-0.7392i      0.6281       0.0768
##   
## 

## 
## Coefficients of Original polynomial:  
## 1.0418 -0.3213 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0418B+0.3213B^2    1.6210+-0.6960i      0.5669       0.0646
##   
## 

## 
## Coefficients of Original polynomial:  
## 1.0612 -0.3355 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0612B+0.3355B^2    1.5813+-0.6926i      0.5793       0.0657
##   
## 

ASEs
##  [1] 0.08252439 0.10633312 0.06495049 0.20670093 0.14328694 0.10872694
##  [7] 0.12299478 0.21005774 0.08122475 0.08532700
mean(ASEs)
## [1] 0.1212127
#ARIMA
ts = df$low
batch_size = 91
start = 1
num_batches = length(ts)-batch_size+1
ASEs = numeric(num_batches)

for (i in 0:(num_batches-1))
{
  forecasts = fore.aruma.wge(ts[start:(batch_size+i)], phi = c(0.1227,-.1183,.1454,-.2587,.029), d = 1, n.ahead = 5, lastn = TRUE, limits = FALSE)
  ASEs[i+1] = mean((ts[(length(ts) - 4):length(ts)] - forecasts$f)^2)
}

ASEs
##  [1] 0.17345445 0.20945347 0.66851333 0.93863985 0.10974201 0.09601358
##  [7] 0.87826516 0.87427621 0.39701013 0.11582064
mean(ASEs)
## [1] 0.4461189
#Ensemble
mlp_pred = as.numeric(fore.mlp$mean)
signoise = as.numeric(signoise.forecast$f)
en_pred = (mlp_pred + signoise)/2
mean((ts[(length(ts) - 4):length(ts)] - en_pred)^2)
## [1] 0.07126488